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Abstract 

fish is a fast and simple ideal magneto-hydrodynamics code that scales to ~ 10 000 processes for a Cartesian 
computational domain of ~ 1000 3 cells. The simplicity ofHSH has been achieved by the rigorous application 
of the operator splitting technique, while second order accuracy is maintained by the symmetric ordering 
of the operators. Between directional sweeps, the three-dimensional data is rotated in memory so that the 
sweep is always performed in a cache-efficient way along the direction of contiguous memory. Hence, the 
code only requires a one-dimensional description of the conservation equations to be solved. This approach 
also enable an elegant novel parallelisation of the code that is based on persistent communications with 
MPI for cubic domain decomposition on machines with distributed memory. This scheme is then combined 
with an additional OpenMP parallelisation of different sweeps that can take advantage of clusters of shared 
memory. We document the detailed implementation of a second order TVD advection scheme based on 
flux reconstruction. The magnetic fields are evolved by a constrained transport scheme. We show that the 
subtraction of a simple estimate of the hydrostatic gradient from the total gradients can significantly reduce 
the dissipation of the advection scheme in simulations of gravitationally bound hydrostatic objects. Through 
its simplicity and efficiency, fish is as well-suited for hydrodynamics classes as for large-scale astrophysical 
simulations on high-performance computer clusters. In preparation for the release of a public version, we 
demonstrate the performance of fish in a suite of astrophysically orientated test cases. 
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1. Introduction 

It has been argued that computational models provide the third pillar of scientific discovery, beside 
the traditional experiment and theory. On the practical side, there are indeed a lot of similarities between 
computer models and experiments: In both cases one tries to start from a well-defined initial state. The 
outcome of the simulation is in the same way unknown as the outcome of an experiment and unexpected 
results are interesting in both cases. Also, one has to define observables whose values one intends to measure 
at several time intervals or at the end of the simulation or experiment. If the computer model or experiment is 
complicated enough, these values will be affected by systematic and statistical errors and their understanding 
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requires an involved theoretical analysis and interpretation. Finally, both computer model and experiment 
need sufficient, and sometimes quite extensive, documentation to make the results reproducible. 

On the other hand, only the real experiment is capable of probing the properties of nature, while the 
computer model is bound to elaborate the properties of the theory it is built on. The strength of the computer 
simulation is that it may demonstrate and extend the predictive power of its underlying theory. In principle, 
though, it seems completely irrelevant whether the consequences of the underlying theory are evaluated by 
analytical approaches or a computer-aided model. The computer model can be regarded as the analytical 
evolution of a discretised model of the physics equations, where the computer only provides appreciated 
efficiency in the evaluations, but does not add anything new to an analytical investigation. 

This leads to our point of view that the computer model is not an independent pillar of science, but an 
increasingly important contemporary method in the development of theory. The computer model is most 
helpful if many different physics aspects are intertwined in a manner that prevents the treatment of the 
problem by dividing it into subproblems. Classical examples are found in research areas whose subject is 
inaccessible to human manipulation and preparation in the laboratory, like meteorology, planetary interiors, 
stars, and astrophysical dynamics in general. 

Because physics on many different length and time scales is involved, it is of primary importance to first 
perform order of magnitude estimates to identify irrelevant processes and to find equilibrium conditions that 
constrain degrees of freedom. A classical theoretical model is then built by the description of a sequence of 
dominant processes, supported by order of magnitude estimates. The next step in the traditional approach is 
to approximate the complicated microscopic input physics by simple analytical fitting formulas that depend 
on few parameters. If the fitting formulas are simple enough, the problem might become analytically solvable 
so that the dependence of the global features on the chosen parameters can be revealed. In stellar structure, 
for example, order of magnitude arguments show that the matter is in thermodynamic al equilibrium. Hence, 
an equation of state can be defined that expresses the gas pressure, p, as a function of the local density, p, 
temperature and composition. In some regimes, the microscopic physics information in the equation of state 
is well approximated by an equation of state of the form p — Kp l+1 ^ n . Here, k is a constant and n a parameter 
called the polytropic index. If this approximation holds, the equations of stellar structure can be integrated 
analytically for selected integer values of n Hence, traditional theoretical astrophysics relies on two 

prongs: (1) the interpretation of observations in terms of scenarios that are supported by order of magnitude 
estimates, and (2) the explanation of complicated processes by simple approximate laws that allow a precise 
analytical investigation of the most important aspects of the model. 

It is obvious how scientific computing can extend the lever of approach (2). As soon as the complicated 
processes are reduced to simple approximate laws, the initial state and the equations of evolution of the 
model are mathematically well-defined. It is therefore possible to replace the former analytical investigation 
by calculations on the computer. For example, the Lane-Emden equation can be integrated numerically for 
many more values of the polytropic index. There is only one mathematically correct answer to each problem 
and the numerical calculation can be refined until the algorithm obtains the desired precision of the result. 

It is much more intricate to let computer models contribute to approach (1), i.e. the interpretation of ob- 
servations in terms of scenarios that are supported by order of magnitude estimates. A well-known example 
is the propagation of a shock front. Only a minority of computer codes attempt to resolve the microscopic 
width of the shock to correctly treat its complicated input physics. Most codes just ensure the accurate 
conservation of mass, momentum and energy across the shock front to obtain the shock propagation speed 
and the thermodynamic conditions on both sides of the shock. Here, it is the numerical algorithm that 
dynamically performs the simplification of the model and not the description of the input physics like in 
approach (2). Hence, one may implement a rich set of input physics and design the finite differencing such 
that fundamental laws of physics are fulfilled under all possible conditions. Then, the complexity of the 
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model is bound by the scale on which unresolvable small-scale structures are dissipated in space and time. 
Because this scale depends on the numerical algorithm and the mesh topology rather than the investigated 
physics, different solutions of the same physics problem may not converge to the same microscopic solu- 
tion. However, the goal of approach (1) is not to evaluate the detailed microscopic state of ~ 10 9 individual 
fluid cells (for a human being it would take several years of uninterrupted reading to absorb this amount of 
information), but to elaborate the generic macroscopic properties that are determined by the fundamental 
physics laws in combination with the microscopic input physics. The actual microscopic state of the fluid 
cells acts as an exemplary sampling of a specific macroscopic state. Convergence of the evolution of the 
macroscopic state must be demonstrated by careful comparison and theoretical interpretation of concurrent 
implementations of the same physics problem using different algorithms and meshes. Hence, we believe that 
it is important for the reliability of astrophysical models to be investigated with several different radiation- 
magneto-hydrodynamics codes that are simple and efficient enough to be broadly used. 

There are several well-documented and publicly available software packages, gadget is a cosmological 
N-body code based on the Smoothed Particle Hydrodynamics method and parallelised using the Message 
Passing Interface (MPI). It is a Lagrangean approach with a hierarchical tree for the non-local evaluation 
of Newtonian gravity [3]. vh-1 is a multidimensional hydrodynamics code based on the Lagrangian remap 
version of the Piecewise Parabolic Method (PPM) [4] that has been further developed and parallelised using 
MPI. zeus-2D Q S 0] and zeus-3D are widely used grid-based hydrodynamics codes for which a MPI- 
parallel zeus-mp version exists as well. It offers the choice of different advection schemes on a fixed or 
moving orthogonal Eulerian mesh in a covariant description. While these more traditional approaches dis- 
tribute in the form of a software package that includes options to switch on or off, recent open source projects 
try to provide the codes in the form of a generic framework that can host a variety of different modules im- 
plementing different techniques. In this category we could mention the flash code [8, 9] with its main focus 
on the coupling of adaptive mesh stellar hydrodynamics to nuclear burning and the whisky code II 1 QTI as a 
recent general relativistic hydrodynamics code based on the cactus environment. Further we mention athena 
iflTll and enzo lfl2Tl which are both well documented and publicly available codes. 

In this paper we present our code fish (Fast and Simple Ideal magneto-Hydrodynamics), which follows 
a somewhat different strategy, fish is based on the publicly available serial version of a cosmological hydro- 
dynamics code 11311 . Its main virtue is the simplicity and the straightforward approach on a Cartesian mesh. 
It can equally well be used for educational purpose in hydrodynamics classes than for three-dimensional 
high-resolution astrophysical simulations 111411 . However, in the first code versions, simplicity and parallel 
efficiency were not available at the same time. Here we describe a new and elegant implementation of the 
parallelisation with a hybrid MPI/OpenMP approach that has been redesigned from scratch and is well sep- 
arated from the subroutines describing the physical conservation equations. Hence, the same code version 
is now simple and efficient on large high-performance computing clusters. In comparison to the above- 
mentioned software packages it is simple to modify and/or extend and has negligible overhead. With respect 
to earlier versions, the discretisation of the advection terms and the implementation of gravity has further 
been developed to make the fish code more accurate in the treatment of fast cold flows and more stable for 
the evolution of gravitationally bound hydrostatic objects. Special attention was given to the robustness of 
the approach in regions with poor resolution, which are difficult to avoid in global astrophysical models. 
fish has successfully been used in one of the first predictions of the gravitational wave signal from 3D su- 
pernova models with microscopic input physics [15] and provides the foundation for the implementation of 
spectral neutrino transport in our new code elephant (ELegant and Efficient Parallel Hydrodynamics with 
Approximate Neutrino Transport) [16]. 

In Section 2 we describe the physics equations and document the numerical methods used in fish. In Sec- 
tion 3 we discuss the optimisation of the memory access, the parallelisation of fish with MPI and OpenMP 
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and present the scaling behaviour to ~ 10000 processes. In Section 4 we finally investigate the performance 
of fish in a suit of six mostly magneto-hydrodynamic test problems. 



2. Numerical methods 

Solving the MHD equations numerically involves overcoming at least two challenges. Firstly the diffi- 
culty of solving the Riemann problem, used as building blocks for Godunov type shock capturing numerical 



schemes, due to the non-strict hyperbolicity of the MHD equations, see e.g. IU7U18U19I1 . The second prob- 
lem is the divergenceless constraint on the magnetic field. A non- vanishing divergence of the magnetic field 
can produce an acceleration of the magnetised fluid parallel to the field lines l2(ill . 

The algorithm of Pen et al. [13] solves the first problem by using a Riemann solver free relaxation 
method Bill . The second issue is addressed by using a constrained transport method ll22ll on a staggered 
grid. 

The algorithm of Pen et al. makes extensive use of operator splitting. In particular, the evolution of the 
conducting fluid and the magnetic field are split by first holding the magnetic field constant and evolving 
the fluid and then performing the reverse action. The source terms arising due to gravity are also accounted 
for in the operator splitting method. Further, the full three dimensional problem is dimensionally split into 
one-dimensional subproblems. An adequately ordered application of the solution operators permits second 



order accuracy in time B23I1 . 



2.1. The equations of ideal magnetohydrodynamics 

The equations of ideal magnetohydrodynamics (MHD) describe the movement of a compressible con- 
ducting fluid subject to magnetic fields. In ideal MHD all dissipative processes are neglected, meaning that 
the fluid possesses no viscosity and its conductivity is assumed to be infinite. The ideal MHD equations then 
read Q 



dt 

- + V • (vnv 

dt 

dt 



<>p +V-(pv) = (1) 



+ V • (vpv - bb) + VP tot = -pV<p (2) 

8E 

-+V.[(£ + PJv-»(v»)] = -pv-V0 (3) 



v-Vx(vxJ) = 0, (4) 
dt 

expressing the conservation of mass, momentum, energy and magnetic flux, respectively. Here p is the mass 
density, v the velocity and E — pe + jv 2 + y the total energy being, the sum of internal, kinetic and magnetic 
energy. The magnetic field is given by B = / y/Anb and P m = p + K is the total pressure, being the sum of 
the gas pressure and the magnetic pressure. For the equation of state we assume an ideal gas law 

p=pe(y-l) (5) 

where y is the ratio of specific heats. The right hand side of the momentum and energy conservation equa- 
tions detail the effect of gravitational forces onto the conserved variables. The gravitational potential <p is 
determined by the Poisson equation 

V 2 = 4np. (6) 
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The MHD equations (Q]|4]l conserve the divergence of the magnetic field so that an initial condition 

V • b = (7) 

remains true, consistent with the physical observation that magnetic monopoles have never been observed. 

Before we start describing the individual solution operators, we first introduce our notation. We discretise 
time into discrete steps Af" and space into finite volumes or cells Vij t k where n labels the different time levels 

and the triple (i,j,k) denotes a particular cell. The vector u - (p,pv x ,pv y ,pv z ,E} denotes the conserved 
fluid variables. The solution vector u" . k contains the spatially averaged values of the conserved variables at 
time f in cell Vujt, 

Uij,k = 77— f u (x, t) dxdydz, (8) 

where the cell volume Vfn - AxAyAz is given by the assumed constant cell dimensions Ax = x,< - Xt-\, 
Ay = y,y — yr-\, Az = zj? - Half-integer indices are indicated by a prime i' = i + 1/2, / = j + 1/2, 
k! — k + 1/2 and denote the intercell boundary. Further we define the cell face averaged magnetic field 
components at time f by 

Q>x),j* = -r— f b x (x,t)dydz (9) 

( b y) ■^=-r— f *y(*.0d*dz (10) 

=c^- f b zix,f)dxdy (11) 

where = AyAz denotes the cell face of cell Vj,a located at xp and spanned by the zone increments Ay 
and Az. {b^. ., ^ and (b z )"j k , are defined analogously. 

In an operator-split scheme the solution algorithm to the ideal MHD equations can be summarized as 

U — ^forward ^backward^ , (12) 

where 

forward = L x (At) Bf (At) L y (At) Bf (At) L z (At) Bf (At) 

backward = U (At) B? (At) L y (At) Bf (At) L x (At) Bf (At) ' ' ^ ' 

are the forward and backward operator for one time step. The operators L w evolve the fluid and account for 
the source terms, while the B operators evolve the magnetic field. If the individual operators are second order 
accurate, then the application of the forward followed by the backward operator is second order accurate in 
time [ 23 ] . In the following subsections we shall detail the individual operators. 

The numerical solution algorithm to the MHD equations is explicit. Hence we are restricted by the 
Courant, Friedrich and Lewy B25I1 (CFL) condition. Therefore we impose the following time step 



Af" = k ■ min 

i,j,k 



Ax Ay Az 



(14) 



where 

is the maximum speed at which information can travel in the whole computational domain in direction 
d = x,y,z being the sum of the velocity component in d and the speed of the fast magnetosonic waves Cf . 
We typically set the CFL number k to 0.75. 
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2.2. Solving the fluid MHD equations 

In this subsection we describe the evolution of the fluid variables u in the x-direction. During this 
process the magnetic field is held constant and interpolated to cell centers. The gravitational source terms 
are assumed to vanish for the moment. Then mass, momentum and energy conservation in x direction can 
be written as 



where 



— + — - 

dt dx 



pv\ + P tot - b\ 

pV X Vy ~ b x by 

pv x v z - b x b- 
(E + P, ot )v x - b x b 



(16) 



(17) 



is the flux vector. 

Integrating eq. ( fTot over a cell V, ^ gives 



dUj,j,k 
dt 







(18) 



where the definition of the cell averaged values ([Sj has been substituted and Gauss' theorem has been used. 
The numerical flux Ffj^ represents an average flux of the conserved quantities through the surface S i>j^ 



Ft 



F(x,t)dydz 



(19) 



at given time t. Eq. ( fTSI ) is a semi-discrete conservative scheme for the conservation law ( fl~6b . In the 
following we focus on obtaining the numerical fluxes in a stable and accurate manner. Time integration of 
the ordinary differential equation ( fT8l will be addressed later in this subsection. 

Many schemes for the stable and accurate computation of the numerical fluxes have been devised in the 
literature. Godunov t ype methods achieve this by solving either exact or approximate Riemann problems 
at cell interfaces |26[ 27, 28]. Through solving the Riemann problem, these methods ensure an upwind 
discretisation of the conservation law and hence achieve causal consistency. Due to the already mentioned 
difficulty of solving the Riemann problem in the ideal MHD case, the algorithm of Pen at el. 11311 uses the 
relaxation scheme of Jin and Xin lf2lll . For detailed information on these type of methods we refer to 1 21 , 2^] 
and the references therein. 

The idea of the relaxation scheme is to replace a system like ( TTTT i by a larger system 



du dw 
~dt + ~dx~ 



dw 
~dt 



+ D 



-,du 1 

2 — =-(F(K)-W>), 

ox e 



(20) 



called the relaxation system. Here, the relaxation rate e is a small positive parameter and D 2 is a positive 
definite matrix. For small relaxation rates, system (l20b rapidly relaxes to the local equilibrium defined by 
w = F(u). A necessary condition for solutions of the relaxation system (l20b to converge in the small e limit 
to solutions of the original system dT6b is that the characteristic speeds of the hyperbolic part of (1201 ) are at 
least as large or larger than the characteristic speeds in system ( fT6] l. This is the so-called subcharacteristic 
condition. 
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As in Ref. 02111 we choose D — d • I to be a diagonal matrix. In order to fulfill the subcharacteristic 
condition the diagonal element d or the so-called freezing speed is chosen to be 



d = \v x \ + c F , (21) 

where c F is the speed of the fast magnetosonic waves, i.e. the fastest wave propagation speed supported by 
the equations of ideal MHD. 

The key point in the relaxation system is that in the local equilibrium limit it has a very simple charac- 
teristic structure 

d d 

— (w + Du) + D— (w + Du) = 

dt dx (22) 

— (w - Du) - D— (w - Du) = 0, 

dt dx 

where w + Du are then the characteristic variables. They travel with the "frozen" characteristic speeds +D 
respectively. 

System d22l can be easily recast into an equation for u and w. However, we are practically only interested 
in that for u 

du + dF^ + dF2_ Q (23) 
dt dx dx 

where F + = (w + Du)/2 denotes the right travelling waves and F = (w - Du)/2 the left travelling waves. 
In the follwing we shall drop the indices of the ignored directions. Since this defines an upwind direction 
for each wave component, a first order upwind scheme results from choosing Ff, = Ff and FJ, = FJ +1 . In 
this case, the total flux at the cell interfaces is readily evaluated to become 

F t = F+ + FT, = I (F; + F M ) - ~Z> (u M - u t ) (24) 

where F; = w, = F(m,). For D we use the freezing speed 

d = max (dj, di+\) (25) 

in order to satisfy the subcharacteristic condition. We note that this choice for the freezing speed makes the 
numerical flux equivalent to the Rusanov flux and the local Lax-Friedrichs flux. As pointed out in 112911 . a 
wide variety of numerical flux assignments can be derived from the relaxation system by simply letting the 
matrix D having a more complicated form than diagonal. 

So far, the numerical flux (l24l > is only first order accurate. First order methods permit the automatic 
capturing of flow discontinuities but are inaccurate in smooth flow regions due to the large amount of nu- 
merical dissipation inherent to them. As a matter of fact, the large numerical dissipation present in first order 
methods is not a deficit of theses methods but it is the reason why they are stable at flow discontinuities in 
the first place. However, in many applications both smooth and discontinuous flow features are present and 
therefore the use of higher order methods is desirable. We opt for a second order accurate total variation 
diminishing (TVD) scheme due to the low computational cost and the robustness of these type of schemes. 

Let us first consider the right traveling waves F + . Given the z'th cell, a first order accurate flux at the cell 
boundary Xf is then given by Ff, — F+ = F + (m,)- This corresponds to a piece-wise constant approximation 
of the flux function F + (x, t) over the staggered cell [jt;, Jty+i]. For second order accuracy we seek a piece-wise 
linear approximation 

dF + 

F + (x,t)*Fl + 



dx 



(x - Xj) (26) 
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where the derivative may be approximated from first order flux differences. Two choices exist: either left or 
right differences 

' aft* = (f;-fi 1 )/ax 

(27) 

AF+- R = (Ff +1 -Fl)/Ax 



dF + 

dx 



To choose between the left AFt' L and right AF + ' R differences a flux limiter < 



AFf = (f> (AF*' L , AF+' r ) 



(28) 



is used. 

This limiter enforces a nonlinear stability constraint commonly known as TVD to ensure the stability of 
the scheme in the vicinity of discontinuities. The limiter reduces spurious oscillations associated with higher 
accuracy than first order to get a high resolution method. See for example |27, 3(1 18, 28] and references 
therein. 

We have implemented the minmod limiter 

(f>(a,b) = minmod(a,fr) = i (sign(a) + sign(b))min(|a|, \b\) (29) 

which chooses the smallest absolute difference if both have the same sign, and the Van Leer limiter 

cf>(a, b) = i (sign(a) + sign(fc)) ^— . (30) 
2 a + b 

Other choices are possible for the scheme to be TVD [21]. Note that when the left and right flux differences 
have different signs, i.e. at extrema and hence also at shocks, no correction is added in d26l ) and the scheme 
switches to first order accuracy. For core-collapse simulations we use the Van Leer limiter in the subsonic 
flow regions and the minmod limiter in supersonic regions. 

In a similar way we may construct a piece-wise linear approximation for the left going fluxes F in the 
staggered cell x i+1 ] starting at 



F - ix , t)aF7+i+ d JL 



(X - Xi+\) 



(31) 



8F_ 

dx 



with either the left or right differences 

' AF7;t = (F- 1 -Fr)/A^ 

, AF7f = (F- 2 -Fr +1 )/A* 
Again the flux limiter (f> is used to discriminate between the left or right differences 

AF- +1 =0(AF-f,AF-f). 
The total second order accurate numerical flux is then simply 

F,=Ft + FJ +l + ^(AFt-AFJ +1 ). 



(32) 



(33) 



(34) 
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uT l — u" — — — (Ff, - F'}',_ x ) . (35) 



For the time integration of eq. ( TT8l > we use a two step predictor-corrector method. As predictor we 
compute a half time step with the first order fluxes (l24l >. We regard the freezing speed in the predictor step 
as a parameter varying between d — and d = max (di, dt+{) to regulate the numerical dissipation. Hence we 
vary the predictor between a first order scheme and a second order centered difference scheme depending on 
the application. 

In the corrector step we then use the calculated values from the predictor step u" to compute the second 
order TVD fluxes eq. <T34t : 

Af 
Ax 

Hence we obtain a second order update in time and space of the fluid variables. This ends the description of 
the L x solution operator. The other spatial directions are treated in the same way. 

2.3. Incorporation of gravitation 

Gravitational forces play an important role in most astrophysical processes. A code devoted to the 
simulation of these processes should therefore include gravity and we describe our implementation in the 
following subsection. To integrate the source terms, we split the source term dimensionally as follows: 

du 3F 

¥ + aT 5 * (36) 

where S x = (0, -pd^/dx, 0, 0, -pvj)<pldx), and in an analogous manner for the y and z direction. In the 
following we shall regard the gravitational potential as given and constant in time. For the integration of eq. 
d36*] > one then has two possibilities, either an operator split or an unsplit method. 

In the fully operator split version, the evolution of the conserved variables is divided into a homogeneous 
system (S x = 0) and the ordinary differential equation 

du 

-T = S x (37) 
at 

Solving the homogeneous part has been discussed in the previous section. In operator notation, we then 
solve equation (l36l l as 

" n+1 = Gf (f)^ (Af)Gt (y) (38) 

which is second order accurate in time. 

Further, for reasons to be explained in section [3] our code has as fundamental variables density, mo- 
mentum and internal energy. Therefore we only update the momentum field, and the operator G x is then 
explicitly 



G x (Af) : (pv)'; +l = (pv)'; -M-p1 1 ^ I , (39) 



where we use centered differences for the gravitational potential 



dx , 2Ax 



(40) 



Note that the density field is left constant according to ( 137) 1. The total energy is computed by summing up 
internal energy, magnetic energy and kinetic energy and given as input to the L x operator. Therefore the total 
energy change due to the source term is implied from the updated momentum field. 
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As a second possibility, we implemented an unsplit version. There we directly account for gravity in the 
fluid predictor/corrector steps. The predictor step then is given by 



where the F" is the predictor flux as described in previous section and 



(41) 



S" 



The corrector step is then 





P'l 



At 



01+1 - 

2Ax 



(42) 



<-^«-*ti) + A*Sf, 



(43) 



where the fluid fluxes are given in previous section and 5" is analogous to j42l . However, the density and 
the momentum are then given by the predictor step n' . 

Both implementations of the source terms are second order accurate in space and time. 

Astrophysical simulations often include gravitationally bound objects that can be in hydrostatic equilib- 
rium (HSE). With the described discretisation of the fluxes, the gradients due to the hydrostatic stratification 
are interpreted by the fluid scheme as a propagating wave and the scheme invokes numerical dissipation. In 
HSE, however, the gradients are not the result of a propagating wave but rather a consequence of gravity 
which is stationary. In order to subtract the gravitationally induced gradients from the algorithm that tailors 
the dissipation in the TVD scheme, we try to find an analytical estimate of the gradients present in HSE. 

In the following we neglect the influence of magnetic fields and are considering only gas pressure sup- 
ported equilibrium. In HSE, the pressure variation is then given by 



1 dp 
p dx 



d0 
dx 



(44) 



Furthermore we neglect entropy and composition gradients. The later assumption is needed for more com- 
plex equations of state p = p(p, s, Y) where Y denotes a vector of chemical compositions. Under these 
assumptions, we can express the pressure gradient as 



dp=(dp\ dp^^dp 
dx \ dp J s Y dx s dx ' 

where c$ is the speed of sound. Then the density gradient in HSE can be expressed with (l44b as 



dp 

dx 



4 dx ' 



(45) 



(46) 



If we now relax the HSE condition by allowing the velocity to be non zero but roughly constant across a few 
cells of investigation, we can rewrite the momentum gradient by using the continuity equation 



dpv x _ pv x d(f> 
dx c 2 s dx 



(47) 
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Similar equations can be derived for the y and z components of the momentum gradient in x direction. Using 
the fundamental thermodynamic relation de = -pdV an energy gradient can be derived as well 



dE 
dx 



c\ dx 



In summary, we then have the following gradients in HSE 



du 

dx 



where 



cj dx' 



P 

pV x 

pVy 

E+p 



(48) 



(49) 



(50) 



This estimate of the gradients in HSE are now subtracted from the difference k !+ i - that generates the 
numerical dissipation when it is multiplied by the freezing speed. First we multiply d49l by dx and discretise 
the result as 

i ft- 
Si = ~ — ■ 

csj ■ c S j+\ 2 



- 4>d- 



(51) 



This approximation is more in the spirit of finite differences than finite volume methods. Nevertheless, since 
we only seek second order accuracy we did not explore more sophisticated methods. Then we define 



Am, = Ui+i — h, 

Sit, = minmod(Aii,- - g b Am,) . 



(52) 
(53) 



The minmod function prevents 6u, of becoming antidissipative. The HSE corrected difference is then used 
to modify the first order upwind flux ( l24t as 



where 



i (Fj + F i+ i) - -DadvAM; - ^D dC 6uj 



D. dAv = max(v A%; , v A - /+ i)7 
D dC =max(cs,i,c s ,i+i)I. 



(54) 



(55) 
(56) 



Hence for the advective part of the numerical dissipation we use the full difference Ah,- while for the acoustic 
part we use the HSE corrected difference 8uj. The second order fluxes are constructed in the same way as 
described in section l2~2l but with the HSE corrected first order fluxes. 

We note that the previous description of HSE is only directly applicable for purely hydrodynamic simu- 
lations. However, we have found that by replacing the acoustic speed by the fast magnetosonic speed 



D. dC = max (c/y, c FJ+ i) I 
works in a stable manner for MHD simulations as well. 



(57) 
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For completeness we outline how the HSE correction could be applied more generally to other flux 
assignment methods in the context of purely hydrodynamic simulations, e.g. the method of Roe. In that 
method, the intercell flux is given in the form 

1 1 5 

Ft = 2 ( f i + F + i) - ^ J] a Mp\ K P> ( 58 > 
p=i 

where the A p are the eigenvalues and the K p are the right eigenvectors of the Jacobian matrix of the flux 
function dF/du. The eigenvalues are A\ = v x — c$, ^a,3A ~ v * an d ^5 = v x + c s- The eigenvectors can be 
found, for example, in II28I1 . The coefficients a p are obtained by projecting the differences in the conserved 
variables onto the right eigenvectors: 

5 

bu i = Y A a P K P - (59) 
p=i 

The HSE correction can now be used to modify the projection of the acoustic modes 1 and 5. Instead of 
projecting the acoustic modes onto Am,- , one can project them onto the HSE corrected differences Suj. The 
advective modes 2,3 and 4 remain identical. 

We mention that Ref. 13111 derived a simple "modified states" version of the PPM method with the same 
aim to improve the numerical solution in HSE. However, their method is only directly applicable to the PPM 
method. 

Alternatively, the gravitational source terms can be completely avoided by using a fully conservative 
form of the hydrodynamics equations (see (49J). 



2.4. Advection of the magnetic field 

In this subsection we describe the advection of the magnetic field in x-direction B^. The operators for 
the update in y and z directions B* : and Bf are handled analogously. During this operator split update, all 
quantities other than the magnetic field are held constant. The update is then prescribed by the induction 
equation 

v-Vx(vxJ)=0. (60) 
at 

Straight forward discretisation of (l60l i can only guarantee that V ■ b = is of the order of the truncation 
error. However, at flow discontinuities, the discrete divergence may become large. As a consequence, large 
errors in the simulation can accumulate ll32ll . Therefore care has to be taken. A variety of methods have 
been proposed to surmount this difficulty, see e.g. 11221 l33l 34 l35l 13611 . The algorithm of Pen et. al and 
hence our code, uses the constrained transport method 

The key idea of the constrained transport method is to write the induction equation in integral form. 
Integrating eq. d60i l. for example, over the surface S? of cell V^jja substituting definition (O and using 
Stoke's theorem yields 

i(Px)fj*= f vxbdx (61) 

where dSfjjt denotes the contour of Spjjc, i.e. the edges of the cell-face at i' , The integral form then 
naturally suggests one to choose the normal projections of the magnetic field at faces of the cell V, ^ and 
the normal projections of the electric field E — v x b at the cell edges as primary variables. This positioning 
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leads directly to the jump conditions of electric and magnetic fields H37I1 and therefore mimics Maxwells 
equations at the discrete level. The discrete form of the V ■ b = constraint is then defined as 



Ax 



Ay 



Az 



(62) 



A detailed inspection of the characteristic structure of the induction equation reveals the presence of two 
transport modes and one constraint mode. As pointed out by Pen et al. 01311 . this allows one to separate 
the evolution of the induction equation into advection and constraint steps. In x-direction, for example, this 
means that the y and z components of the magnetic field need to be updated as 



d 1 



(63) 



Ax 

However, the x component of the magnetic field has to be updated as 



at 



A* 



1 

Ay" 



- -J- [(yJJrjj, - (vA)i<,/,*<-i] • 



(64) 



The fluxes in eq. d63l need to be upwinded for stability, since they represent the two advection modes. To 
maintain V • b — within machine precision, the same fluxes used to update b x and b z in eq. d63l need to be 
used in eq. (l64l for the b x update. A simple calculation then clearly shows that <9/<9f(V • b) = 0. 

To update (by). we then proceed as follows. Note that the velocity v x needs to be interpolated to the 

same location as (b y ). ., ^, i.e. to cell face Sijy. 



(Vx)i,j',k = 



(Vx)i,j,k + (y.x)i,i+l,k 



(65) 



This simple interpolation is, however, not numerically stable due to the negligence of causal consistency (or 
equivalently numerical dissipation). It turns out, that for stability a simple averaging in x direction 



(y x \ f , k = i [(v x ) i+lJM + 2 (y x \ n + (v x )^ 1Jik ] 
introduces enough numerical dissipation [13]. A first order accurate upwinded flux is then given by 

yxby) , (Vx)i',f,k>0 

(vA),. +1J , , . (v,),j,k < 



where the velocity average is 



(vx)i',r,k = X - [(y x )i,r,k + (vx) i+ ij>,k\ ■ 



(66) 



(67) 



(68) 



Second order accuracy in space and time is achieved in the same fashion as for the fluid updates: a first order 
predictor combined with a second order corrector with a piece-wise linear TVD approximation to the fluxes. 
This ends the description of the \ b y j. k component of the magnetic field. The update of the (& z ) ( - - jk , follows 
the same strategy. 
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2.5. Generalisation to non-uniform meshes 

Finite volume methods can be constructed for non-uniform meshes in a straightforward manner. The 
only quantities affected by the non-uniform mesh are the volumes of the computational cells, their bounding 
surfaces as well as the cell edge lengths. 

In the current version of the code, we have implemented irregular Cartesian meshes. Then the mesh 
increments Ax,- = x? - Xy-\, Ayj = yy - yy-i, Azk = zw — Zf-i are no longer constant for the respective 
direction and equation (fT~ST > then changes to 

_^. + _(F^-F f _ ut ) = 0. (69) 



For more general coordinates see for example [3j 

The update formulas for irregular Cartesian meshes are then simply obtained by substituting Ax by Ax,- 
adequately in all the previous sections. Analogously Ay by Ay j and Az by Azk- Furthermore, the velocity 
interpolation in the magnetic field advection needs also to conform with the non-equidistant spacing of 
the cell centers. However, the velocity averaging (l66l > is left unchanged since its primary goal is not an 
interpolation but an inclusion of the stabilising effect of numerical dissipation. Finally, the CFL condition 
eq. (fT4l needs to be adapted, 



At" = fcmin 



AXi A yj Az k 



^i,j,k ^i,j,k ^UJcJ 



for all i, j, k. (70) 



3. Parallelisation and performance 

The implementation of the above-described simple algorithms uses the directional operator splitting 
in a peculiar way: Instead of the traditional approach to hold the data locations fixed in memory while 
sweeping updates in x-, y-, and z-directions, we rearrange the data in the memory between the sweeps so 
that different directional sweeps always occur along the contiguous direction of the data in the memory 11311 . 
This has the advantage that the data load and store operations are very efficient for the large arrays that 
contain the three-dimensional data and that only one one-dimensional subroutine is required per operator 
split physics ingredient to perform the corresponding data update in the sweep. The disadvantage is the 
additional compute load to rearrange the data (of order 10% of the total CPU-time) and the complications 
the rearranged data can cause if the code needs to import or export oriented data between the sweeps (e.g. 
for debugging). 

The most convenient operation to rearrange the data in the desired way is a rotation with angle 2n/3 
about the axis threading the origin and point (1,1,1). Each rotation aligns another original coordinate axis 
with the current x-direction, without changing any relative quantities between data points or the parity of the 
system. Three consequential rotations lead back to the original state. But how does this procedure interfere 
with the parallelisation? After having applied a cuboidal domain decomposition for distributed memory 
architectures, one realises that it is not meaningful to rotate the whole domain about the same rotation axis, 
because this would invoke excessive data communication among the processes. It is sufficient to to rotate 
all cuboids individually about the axis defined in their local reference frame. These local rotations do not 
require any data communication. 

However, data communication is required across the interfaces of the distributed data to calculate the 
advection terms in Eq. (1181 1. the gradient of the gravitational potential ( l40b . or to interpolate the velocities 
in the update of the magnetic field to the zone edges in Eq. (l65l l and (1661 1. In the following, we describe our 
approach to the parallelisation in two dimensions, its extension to the third dimension is straightforward. 
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Figure 1: The interaction between sweeps and rotations is illustrated for a process A with its four neighbouring processes B-E. The 
distributed data stored and treated in process A carries a small permanent buffer area (light shading) around all of its interfaces. An 
additional volatile buffer (dark shading) is added in the current .v-direction. The ,t-sweep is performed horizontally. For the y-sweep, 
the data and the permanent buffer are rotated clockwise by 90 degrees such that the y-sweep can also be performed in the horizontal 
direction. 



FigureQ]shows the computational domain of a process A after the cuboidal domain decomposition. The 
data stored and updated in process A is surrounded by a small permanent buffer zone (light shading). In the 
current x-direction, there is an additional volatile buffer zone (dark shading). At the beginning of a time step, 
the overlapping data from process B is communicated to process A in order to update the buffer designated 
by 'left'. The overlapping data communicated from process C fills the buffer designated by 'right'. During 
the communication, the horizontal sweep in x-direction can already start to work on the interior zones that 
don't require the communicated buffer zones. Once all data has arrived, the x-sweep can be completed so 
that all zones in domain A and the permanent buffer is up to date. Hence communication is overlapped with 
computation. 

For the sweep in y-direction, all local data in Figure Q] are rotated clockwise by 90 degrees so that the 
y-direction becomes horizontal. This time it is the data of process E that need to be communicated to the 
buffer zones designated by 'left' and the data of process D that need to be communicated to the buffer zones 
designated by 'right' . Again, the sweep can start with the inner zones and update the border zones once the 
communications have completed. The important point is to realise that the >'-sweep is now also applied in 
the horizontal direction so that the one-dimensional subroutines used for the x-sweep can be used without 
any modifications. In the three-dimensional code, the procedure is repeated a third time for the sweeps in 
z-direction. Finally, all three sweeps are once more applied in reverse order to obtain second order accuracy. 
This ordering of the sweeps requires four rotations between the six directional sweeps. 

Hence, the communication pattern is the only action that needs to be performed differently during the 
different sweeps: For the x-sweep we need to copy B — > A (left) and C — > A (right ), while for the y-sweep 
we need to copy E — > A (left) and D — > A (right). This distinction can easily be implemented using 
persistent communications in the MPI. When the code is initialised, the communications B — > A (left) 
and C — > A (right )are defined to be invoked by a handle h x , while the communications E — > A (left) 
and D — > A (right)are defined to be invoked by a handle h y . With this preparation it is straightforward 
to perform the x-sweep by calling a function sweep f/z A , n x , n y ,A, f) and the y-sweep by calling the same 

function sweep \h y , n y , n x ,A', f), where n x and n y specify the dimensions of A, where A' is the rotated state 
of A and where £ points to a one-dimensional function that implements the physical equations treated by the 
sweep. In this way, the setup of the communication topology during the initialisation of the code, the trigger 
of communications and the dimensional index gymnastics in the sweep subroutine, and the implementation 
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of the physics in the function f are all very well disentangled. 

The repeated evaluation of the physics equations in the function £ amounts to the dominant contribution 
to the total CPU-time. Because the sweeps are now always performed along contiguous memory, it is 
possible to pipeline the physics quantities in the cache so that the access of the large data arrays is reduced 
to a minimum. The first order predictor and second order corrector are evaluated according to the following 
scheme: 

loop over cells i in x-direction 
u3 = u4 
u4 = u5 
u5 = u6 
u6 = u(i) 
if i<3 cycle 
mil = uu2 
mi2 = mi3 
uu3 = mi4 
mi4 = uu5 

uu5 = u5 + rate(u4,u5,u6)*0.5*dt Ifirst order 
if i<7 cycle 

u(i-3) = u3 + rate (mil, mi2,mi3,mi4,mi5)*dt (second order 
end loop over cells 

Here, u(i) is the state vector with the conserved variables, which is only involved twice per time step dt, once 
for data retrieval and once for data storage. The hydrodynamics equations of Eq. d36*b and their discretisation 
described in Eqs. (f4TT > and d43l are abbreviated here by the evaluation of rate(). The whole operation has 
a stencil of 7 cells and will lead to 3 unassigned cells at each end of the array u. Hence, the buffer in Fig. Q] 
has to be large enough to host the unassigned cells. 

Note that the horizontal sweep in x-direction is performed independently for all n y rows of the data in 
process A. Hence it is straightforward to further parallelise the loop over the rows i y = 1 . . . «/ with OpenMP, 
where n\ = n y + 2n,b p is the dimension of A including the permanent buffer of width n\, p on either side. This 
one OpenMP parallel section suffices to parallelise over 90% of the workload of the code for shared memory 
nodes. Hence, the above-described approach naturally leads to a hybrid parallelisation, where MPI is used 
to distribute the memory by the cuboidal domain decomposition across different nodes or processors, while 
OpenMP is used to parallelise the loop over the rows along the current sweep direction on the cores that are 
available to each node or processor. 

We evaluated the strong scaling of fish on the new ROSA system (Cray XT-5, nodes with 2 quad-core 
AMD Opteron 2.4 GHz Shanghai processors, SeaStar 2.2 communications processor with 2 GBytes/s of 
injection bandwidth per node) at the Swiss National Supercomputer Center (CSCS). The dominant limita- 
tion to the scaling of fish is the rather large stencil that emerges from the combination of the MHD and 
hydrodynamics solver with a first and second order step in a single sweep, fish scales without problem if the 
problem size is increased with the increased number of processors. More interesting is the case of a fixed 
size problem. Figure|2]shows the strong scaling for a problem with 600 3 cells. As the number of processors 
is increased, the ratio of buffer zones to volume zones increases as well. In this case it is the evaluation of the 
physics equations on the buffer zones that limit the efficiency. However, Fig. [2] also shows that fish scales 
very nicely to of order 10000 processors in the hybrid MPI/OpenMP mode where MPI is used between 
nodes and OpenMP within the node. This scaling can be achieved because the parallelisation with OpenMP 
does not increase the number of buffer zones. A version of fish with smaller stencils is under development. 
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Figure 2: Strong scaling of the fish code. The speedup, normalised to 216 processes, is shown on the left hand side. The efficiency is 
displayed on the right hand side. The dashed lines with data points taken at the circles refer to a parallelisation that uses only MPI. The 
solid lines with data points taken at the crosses refer to a hybrid parallelisation with MPI between nodes and OpenMP within nodes. 
The problem size was kept constant at 600 3 cells. The deviation from perfect scaling is rather due to the increase of work on buffer 
zones with respect to the work on volume zones than a bottleneck in the communication. 

4. Numerical results 

In this section we verify our code by performing several multidimensional test simulations of astro- 
physical interest. The algorithm has already been tested for second order accuracy in 1 1 3 .1 and we do not 
repeat that here. Unless otherwise stated, we use periodic boundary conditions for all test problems. The 
simulations are stopped before any interaction due to the periodic boundary can occur. 

4.1. Circularly polarised Alfven waves 

Our first test problem involves the propagation of circularly polarised Alfen waves. These waves are 
a smooth exact non-linear solution to the equations of ideal MHD B24fl and represent therefore an ideal 
problem to test the convergence of numerical methods. 

We have used the same initial conditions 13611 . Circularly polarised Alfven waves are propagated the 
x - y plane at an angle of a — 30° with respect to the jt-axis. The domain is set to [0, 1 / cos a] X [0, 1 / sin a] 
and discretised with N x N cells where N = 32,64, 128,256,512. The adiabatic index is set to y — 5/3. 
Then we have p = 1, vy = 0, p = 0.1, b\\ = 1, v± = b ± = A sin[27r(xcos a + ysina)] and v z = b z — 
A cos[27r(jccos a + ysina)]. The subscript || and ± denote the components parallel and perpendicular to the 
wave propagation, respectively. We have set the wave amplitude to A = 0.1. With these parameters the 
Alfven velocity is B\\p = 1 and therefore the wave profile is advected to its initial position after a simulation 
time t — 1 . 

In figure [3] are shown the numerical errors after the wave profile crossed the domain once t = 1. The 
error is given by the LI norm of the difference between the initial conditions and the numerical solution at 
time t — 1 over all cells: 

•J 

To get the absolute numerical error we have summed over all components of 6u. The scheme shows second 
order accuracy as can be inferred from figure [3] 

4.2. Sedov-Taylor blast wave 

The Sedov-Taylor blastwave test is a purely hydrodynamical test involving a strong spherically sym- 
metric outward propagating shock wave. An analytical self-similar solution can be found for example in 
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1 39l 40h . We use an adiabatic index of y = 5/3. The problem is solved on a cubic domain (x,y,z) e [0, l] 3 
with 256 3 computational cells and equidistant mesh spacing. 

The density is set to unity and the velocity to zero throughout the whole domain. A huge amount of 
internal energy e — 3 x 10 3 is placed uniformly in a small spherical region of radius 0.01 at the center of the 
domain. Outside the spherical region the energy is set to e — 1 x 10~ 3 . The energy amount concentrated at the 
center then starts a strong shock wave which propagates spherically outward from the center of the domain. 
The simulation was stopped at time t = 1.7419 x 10~ 4 when the shock has propagated to a distance r as 0.45 
from the center. In figure |4] a random subset of cells (blue points) and a radial average of the numerical 
solution (green) are compared against the exact solution (red). All the values have been normalized to the 
exact postshock value. The postshock values of the numerical solution are lower but come close to the exact 
solution. Despite we have a dimensionally split code, the scattering of the points at the shock is not dramatic 
when compared to the Cartesian grid size of Ax — Ay — Az ~ 0.004. In figure 0a shaded surface plot is 
shown for the density in the x,y plane with z - 1/2. As seen in the figure, the shock looks spherical and no 
serious symmetry breaking due to the dimensionally split character of the code can be seen. 




Figure 3: Convergence for circularly polarised Alfven waves. The points are the absolute numerical error as function of the number of 
grid points. The solid line is a reference second order rate. 
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Figure 4: Radial distribution of the Sedov-Taylor blast wave test. The blue points represent a random subset of all the cells, the green 
line is a radial average of the numerical solution and the red line is the exact solution. All values have been normalized to the exact 
postshock values. The ticks on the r axis have the size of the mesh spacing Aa" and the shock is resolved within roughly 2Sx. 
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4.3. Spherical Riemann problem 

This problem is a spherical setup of Sod's shocktube and was suggested in [28Q as a test for multidi- 
mensional hydrodynamics codes. The solution is computed on a cubic domain (x,y,z) e [0, l] 3 with 256 3 
computational cells and equidistant mesh spacing. Once again, an adiabatic index y — 5/3 is used. 

The domain is decomposed into a spherical region of radius R = 0.25 at the center of the domain and 
the remaining exterior region. The fluid variables are set to constant values in each region. At the boundary 
between both regions a circular discontinuity results. In both regions, the velocity is set to zero. In the inner 
region we set the density p = 0.125 and the pressure p =0.1. In the outer region, we set the density p — 1 
and the pressure p — 1. The simulation is stopped at time t - 0.09. 

In figure[6]the density distribution at the end of the simulation is displayed. The blue points are a random 
subset of cells, the green line is a radial average of the numerical solution and the red line is a highly resolved 
numerical solution from a ID spherically symmetric code based on the same algorithm as the 3D one. The 
shock, the contact discontinuity and the rarefaction wave are well captured by the 3D code. The scattering 
of the points is strongest at the inward propagating shock. In figure [7] a shaded surface plot of the density 
in the x,y plane with z = 1/2 is pictured. The spherical character of the solution is well retained and the 
dimensional splitting does not degrade this symmetry. 



4.4. Magnetic explosion 

This test is based on the same idea as the Sedov-Taylorblastwave problem with the addition of a magnetic 
field We used the same domain, number of cells and adiabatic index. The density is set to p = 1 and the 
velocity to vanish everywhere in the domain. At the center of the domain we set the pressure P = 100 
in a spherical region with radius R - 0.1. Outside of a sphere with radius R = 0.125 we set the pressure 
to P — 1. In the spherical shell between the high central pressure and the outer low pressure regions, 
we let the pressure vary linearly between the two extremes. The magnetic field components are set as 
b x = by = 7 / V(2) and b z = 0. Inside the high pressure sphere the ratio between gas pressure and magnetic 
pressure is B = 2P/B 2 * 4 and outside it is B — 2P/B 2 * 0.04. This is a difficult test problem for codes 
evolving the total energy: The internal energy is obtained by the substraction of kinetic and magnetic energy 
from the total energy. When the energy density is locally dominated by the magnetic field, negative internal 
energies can occur which then break down the simulation. 

The simulation was stopped at t — 0.03 and we used a CFL number of 0.5. For the fluid update we 
used the minmod limiter. The results are displayed in figure [8] for the density, pressure, kinetic energy and 




Figure 5: Shaded surface plot of the density for the Sedov-Taylor blast wave in the x,y plane with z = 1/2 illustrating the spherical 
symmetric character of the numerical solution. 
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magnetic energy. As apparent from the figure, spherical symmetry is broken and one clearly distinguishes 
between the flow propagation parallel and orthogonal to the magnetic field. The outermost shell indicates 
a fast magnetosonic shock, which is only weakly compressive. The energy density is dominated by the 
magnetic field. On the inside there are two dense shells propagating parallel to the magnetic field. From the 
outside these dense shells are bound by a slow magnetosonic shock and a contact on the inside. 

Similar initial conditions can be found in the literature, see e.g. J4III42I l43ll . We have tried to run our 
code with the same initial conditions as in ref. Il44ll where the magnetic field is higher b x = b y =10/ V(2) and 
with no linear interpolation between the high and low pressure regions. However, we didn't succeed under 
reasonable CFL conditions (> 0.05) and simulation time. The appearance of negative internal energies 
always stopped the simulation. For very low ft flows we pay the price for the algorithmic simplicity of 
splitting fluid and magnetic advection. An upwind interpolation of the velocities to cell faces might help to 
cure the problem. However, we have not yet explored this possibility. 

4.5. The rotor problem 

Our next test is the so-called rotor problem, presented in ll4lll . We use the first rotor problem in l36ll as 
initial conditions. We solve this 2D problem in the Cartesian domain [0, l] 2 with 256 2 computational cells. 
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Figure 6: Radial density distribution of the spherical Riemann problem. The blue points represent a random subset of all the cells, the 
green line is an average over all the cells and the red line is the result of a highly resolved ID simulation. 




Figure 7: Shaded surface plot of the density for the spherical Riemann problem in the x, y plane with z = 1/2 illustrating the spherically 
symmetric character of the numerical solution. The density ranges from 1 (black) to 0.125 (white). 
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The problem consists of a dense rapidly rotating cylinder (the rotor) with p — 10, v x — -2(y - l/2)/ro, 
Vy = 2(x - l/2)/ro extending up to a radius ro = 0.1, where r - [(x- 1/2) 2 + (y - l/2) 2 ]^ 2 , installed in a 
lighter resting fluid. The lighter fluid is characterized by p = 1, v x = v y = for r > r\ — 0.1 15. In between 
the rotating and the light fluid ro < r < r\ we set 

P = l + 9/(r) 

v, = -2f(r)(y-l/2)/r 

v y = 2/(r)(jt-l/2)/r 

where 



This smoothes out the discontinuities and reduces initial transients. The magnetic field is initially set to 
b x — 5/ V47T, fry = b z — and the pressure is uniformly set to p — 1 . The adiabatic index used for this test is 
y=1.4. 

The simulation was run to time t = 0.15 and the results are displayed in figure [9] The dense rotating 
cylinder initially not in equilibrium has started to expand until the magnetic pressure due to field wrapping 
has stopped the expansion resulting in its oblate shape. The outer layers of the dense cylinder has lost part 
of its initial angular momentum in form of Alven waves radiating away. This braking of the magnetic rotor 
is a possible model for the angular momentum loss of collapsing gas clouds in star formation 11451 14611 . 




Figure 8: Shaded surface plot of the magnetic explosion at time t = 0.03. Density (top left) ranges from 0.24 (white) to 2.17 (black). 
Gas pressure (top right) ranges from 0.25 (white) to 15.85 (black). Magnetic energy (bottom left) ranges from 7.66 (white) to 33.50 
(black). Kinetic energy (bottom right) ranges from (white) to 17.33 (black). 
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4.6. Two dimensional MHD Riemann problem 

The next test we performed is a two dimensional Riemann problem. The initial conditions are similar to 
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and the magnetic field is uniformly b = (2, 0, 1) / "V^tt. We setup the problem on the domain [0, 0.8] 2 and 
hence L x = 0.8, L y = 0.8 with 512 2 cells. The adiabatic index is set to y — 5/3. For this test we used zeroth 
order extrapolation boundary conditions and evolved the problem to time t = 0.8 The density, pressure, 
kinetic energy and magnetic energy are displayed in figure [10] A similar test problem can also be found in 
1 4811 . Our results compare qualitatively well with the cited references. 



4.7. Steady poly trope 

As a final test, we show the performance of our HSE correction scheme on an astrophysical problem. We 
simulate a model star, a so-called poly trope [1]. These model stars are constructed in spherical symmetry 
from hydrostatic equilibrium 

dP ~ (71) 



dr 



dr 



and Poisson's equation 



1 d 



dr 



(72) 





Figure 9: Contour plot of the rotor problem at time t = 0.15. The density (top left), the gas pressure (top right), the Mach number 
(bottom left) and the magnetic pressure (bottom right). 
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where r is the radial variable and G is the gravitational constant. The functional relation between p and p is 
of the form p = Kp y where k and y are constants. This relation is called a polytropic equation of state, k is the 
polytropic constant and y the polytropic exponent not to be confused here with the adiabatic index. We chose 
the following parameters: M = 1M Q , R = 1R Q and y = 4/3. Here M is the total mass of the star in units of 
solar masses and R is the radius of the star in units of solar radii. With these parameters eq. d7TT > and d72b can 
easily be solved by any ODE integrator. The resulting star has then a polytropic constant k w 3.841 x 10 14 (in 
cgs units) and a mean density of p » 1.408 g/cm 3 . This sets the hydrostatic timescale thse ~ (pG) _1 ^ 2 /2 * 
1.631 x 10 3 s, which is the characteristic timescale on which the star reacts to perturbations of its hydrostatic 
equilibrium. We then mapped the spherical profiles of the density, internal energy and gravitational potential 
onto a 3D Cartesian grid of dimensions [0, 2.97? ] 3 such that the center of the star coincides with the center 
of the domain. We have put around the star a low density atmosphere p atm = 10~ 6 pc where pc = 76.313 
g/cm 3 is the central density. The low density atmosphere is not in hydrostatic equilibrium but since its mass 
is so low it will have little dynamic effect on the star. We adopted continuous boundary conditions. 

We then evolve the hydrodynamic variables for 20 hydrostatic timescales thse with the ideal gas law 
(0, y = 4/3, and with the unsplit source term integrator described in |2.3l The gravitational potential is kept 
fixed for the whole simulation. We have used both 64 3 and 128 3 cells to discretise the domain. 

The results of the simulation are shown in figure [TT] The subplot on the left hand side shows the spher- 
ically averaged density distribution of the simulation with 128 3 cells resolution with and without the HSE 
correction (dash dotted and solid red lines respectively). The solid blue line is the density profile as initially 
put on the 3D grid and is therefore the reference solution. The subplot on the right hand side shows the 
central density normalised to the exact value as function of time in units of the hydrostatic timescale thse- 
The dash dotted lines are once again with the HSE correction while the solid lines are without. The blue 




Figure 10: Contour plot of the 2D MHD Riemann problem at time t = 0.8. Density (top left) ranges from 0.3428 (white) to 2.5698 
(black). Gas pressure (top right) ranges from 0.1208 (white) to 0.7757 (black). Magnetic energy (bottom left) ranges from 0.0039 
(white) to 0.2953 (black). Kinetic energy (bottom right) ranges from (white) to 0.6233 (black). 
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lines have been computed with 64 cells and the red lines with 128 cells. 

Due to the fact that the center of the star is an extremum in density and internal energy, any TVD scheme 
switches there to first order accuracy. Hence the strong numerical dissipation associated with first order 
accuracy heats the center of the star which increases the pressure and as a consequence the central density is 
diminished. In other words the star evaporates. 

As can be seen from figure QT| the HSE correction effectively lowers this spurious effect. The HSE 
corrected simulation with 64 3 cells (blue dash dotted on right subplot) does even better than the simulation 
with 128 3 cells without correction. 

With increasing resolution, as one expects, the effect of the HSE correction gets less dramatic. However, 
for low to intermediately resolved strong gravitationally induced gradients the HSE correction can greatly 
improve the accuracy at the center of the star. 




Figure 11: Left: radially averaged density for the simulation with 128 3 cells with (dash dotted red line) and without (solid red line) 
HSE correction. The solid blue line is the reference solution. The HSE correction significantly increases the accuracy in the inner 
region. Right: central density normalised to the exact value as a function of time. The dash dotted and the solid lines again denote 
the simulation results with and without HSE correction. The blue lines were computed with 64 3 cells and the red ones with 128 3 . 
The central density is kept nearly at constant value for the HSE corrected simulations while without the correction the central density 
decreases monotonically. Note the initial decrease in central density for all curves which is an initial relaxation of the numerical solution 
to the employed resolution. 



5. Conclusion 

We describe a simple and efficient three-dimensional implementation of the ideal magneto-hydrodynamics 
equations in our code named fish. The algorithms used are mostly based on foil , Three different choices 
make the code very transparent and easy to work with: First, the rigorous operator splitting in second order 
operator ordering separates the sweeps across different coordinate directions. Also the constrained transport 
of the magnetic field is operator split from the hydrodynamics update. Second, the use of a Riemann solver 
free relaxation scheme with second order TVD advection facilitates the update of the state vector. Second 
order in time is achieved by a predictor-corrector approach within each time step. Third, subroutines for the 
physics operators are only required for the ^-direction because the sweeps in y-direction and z-direction are 
performed by a rotation of the computational domain. Between the sweeps, the data is rotated such that an 
x-sweep in the direction of contiguous memory actually performs a y- or z-sweep in the coordinate frame 
of the rotated data. We document the details of an improved discretisation of the advective fluxes with re- 
spect to earlier versions of the code. Due to its clarity, we have successfully used the code for exercises and 
computational experiments in magneto-hydrodynamics classes. 
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We present different tests to evaluate the accuracy of fish. Second order convergence is demonstrated 
by the example of a circularly polarised propagating Alfven wave. The capability of accurately capturing 
hydrodynamic shocks is shown in a Sedov-Taylor blast wave explosion and in a spherical Riemann problem 
that are evolved in the three-dimensional computational domain. The latter two examples produce nice 
spherical flow features and show that the directional operator splitting does not cause sizeable asymmetries. 
We also investigate strong magnetosonic shocks in the form of a magnetic explosion. For this problem 
we reached a limit of the current scheme for very low plasma /3 and very strong shocks. However, there 
is still room for improving the scheme in this extreme regimes and future developments will go in that 
direction. In order to further test the coupling between the magnetic fields and the fluid we perform the rotor 
problem in a two-dimensional computational domain. Finally, we calculate a 2D MHD Riemann problem 
and successfully compare the result with the literature. The only test that led to insufficient performance 
was the time evolution of a gravitationally bound hydrostatic object at low resolution. The problem is 
that the presence of gravitation leads to stationary density gradients that are mistaken as propagating wave 
gradients in the scheme that tailors the numerical dissipation for a total variation diminishing solution. The 
corresponding stationary dissipation leads to the evaporation of the bound object on few dynamical time 
scales. We remedied this problem by analytically estimating the hydrostatic gradients and subtracting them 
from the total gradients before calculating and applying the dissipation for the propagating waves. This to 
our knowledge novel and simple measure could also be used in combination with the Roe Riemann solver 
and potentially other approximate solvers. It allows the stable evolution of a gravitationally bound object 
over many dynamical time scales without exaggerated numerical dissipation at the center. This is shown in 
our last test case where we evolve a polytrope in hydrostatic equilibrium. 

Inspite of the straightforward approach, fish is very efficient, even for problems containing ~ 1000 3 
and more cells, fish implements cubic domain decomposition for distributed memory computer clusters. 
The process-wise rotation of the data between the sweeps takes about 10% of the computational cost, but is 
largely compensated for by the fact that the individual sweeps along contiguous memory can be pipelined in 
the cache. The rotation of the data before the directional sweeps enables a very elegant parallelisation scheme 
that is based on non-blocking persistent MPI communication directives. We perform the communication for 
each direction separately and overlap it with computations of the update of interior cells. As sweeps along the 
same direction are embarassingly parallel within a shared memory cluster, we additionally parallelise those 
using OpenMP directives. With this, fish can take advantage of multi-core processors and scales nicely to 
~ 10000 processes for a problem of fixed size. The limiting factor in the scaling of fish is the increasing 
ratio of work on buffer zones to work on enclosed zones. Even more processes can be used efficiently if the 
problem size grows with the number of available processes. 

fish is mainly targeted for astrophysical applications where simple and efficient low-order algorithms 
are appreciated to accept manifold variations of input physics. The input physics is often a patchwork 
of different descriptions and algorithms for the different domains. As they do not always connect with a 
prescribed smoothness and as decisions about approximations are often taken during runtime, it is difficult 
to work with higher order methods, fish has successfully been used in one of the first predictions of the 
graviational wave signal from 3D supernova models with microscopic input physics lfl5ll and provides the 
foundation for the implementation of spectral neutrino transport in our new code ELEPHANT (ELegant 
and Efficient Parallel Hydrodynamics with Approximate Neutrino Transport) |Hfll . The dynamical contrast 
of the density in many astrophysical applications suggests the implementation of an adaptive mesh. We 
discussed the extension of the approach to non-uniform orthogonal meshes, but find them only applicable 
for limited dynamical contrasts of about a factor of two increase of the resolution. On the long term, it 
appears more promising to embed the uniform Cartesian mesh of our current approach in a multi-patch 
driver like e.g. carpet in the cactus framework. We hope that this paper will serve as a reference for future 
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users and developers of the public version of fish, which is in preparation and will soon become available. 
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